
build(deps): bump sglang from 0.5.2 to 0.5.10 #5927

Closed
dependabot[bot] wants to merge 1 commit into main from
dependabot/pip/sglang-0.5.10

Conversation


dependabot bot commented on behalf of github Apr 8, 2026

Bumps sglang from 0.5.2 to 0.5.10.
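For context, in a pip-managed project this bump is a one-line pin change. An illustrative diff (the actual requirements file in this repository is not shown here):

```diff
-sglang==0.5.2
+sglang==0.5.10
```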

Release notes

Sourced from sglang's releases.

v0.5.10

Highlights

  • Piecewise CUDA Graph Enabled by Default: Piecewise CUDA graph capture is now the default execution mode, reducing memory overhead and improving throughput for models with complex control flow patterns: #16331

  • Elastic EP for Partial Failure Tolerance: Integrate Elastic NIXL-EP into SGLang, enabling partial failure tolerance for DeepSeek MoE deployments — when a GPU fails, the system redistributes expert weights and continues serving without full restart: #19248, #17374, #12068 blog

  • GPU Staging Buffer for PD Disaggregation: Gathers scattered head slices into contiguous memory for bulk RDMA transfer, reducing the RDMA request count on GQA models by ~1000x. TPS/GPU at high concurrency increased by ~5x with Prefill TP4 + Decode DEP4 on Qwen3.5: #19890

  • HiSparse for Sparse Attention: Integrate HiSparse sparse attention backend for efficient long-context inference with reduced compute through sparsity-aware attention: #20343

  • SGLang-Diffusion Update:

    • Model support: LTX-2, Hunyuan3D-2, Helios
    • Performance: Qwen-image and Z-image throughput increased by 1.5x
    • New platform: macOS
    • New feature: improved diffusers backend performance by integrating all optimizations from Cache-DiT
    • Skills: feel free to explore the curated skill for developing and optimizing sglang-diffusion!
  • FlashInfer MXFP8 Kernel Support: Integrate FlashInfer mxfp8 kernels for GEMM and MoE operations, enabling mixed-precision FP8 inference with higher accuracy through microscaling for RL and general workloads: #19537

  • Transformers 5.3.0 Upgrade: Major upgrade from transformers 4.57.1 to 5.3.0, unlocking support for the latest model architectures and features from HuggingFace. The GLM-5 model is now supported in this image instead of the custom-built image: #17784

  • DeepSeek V3.2 / GLM-5 Optimization: GLM-5 is runnable on the main branch (with upgraded transformers). Fused Triton kernel for prefill KV cache fetching, NSA fused store indexer for the K cache, TRT-LLM prefill/decode DSA kernels as the default on SM100/SM103, and IndexCache improving throughput by more than 10% under high workloads: #19319, #19148, #20062, #21914, #21405

  • Qwen3.5 GDN/KDA Optimization: Transpose linear attention state layout from [N, HV, K, V] to [N, HV, V, K] and fuse split/reshape/cat ops in GDN projection with Triton kernel, plus CuTeDSL KDA decode kernel support for improved Qwen3.5 performance: #20283, #21019, #21203

  • LoRA Support for MoE Layers: Add LoRA fine-tuning support for Mixture-of-Experts layers with JIT alignment kernels, fused Triton kernels, TP support, CUDA graph support, and auto-detection of LoRA target modules — enabling efficient adapter-based tuning on MoE models like DeepSeek: #19710, #19711, #14105, #21439, #21647

  • Prefill Context Parallel for MHA (Qwen3): Enable context parallelism during prefill for multi-head attention models like Qwen3 MoE, distributing long sequences across GPUs to reduce per-GPU memory and accelerate prefill: #18233

  • Flash Attention 4 Official Library Support: Upgrade to the official FlashAttention 4 package, bringing the latest attention optimizations and Blackwell GPU support: #20303

  • Skip-Softmax Attention for FlashInfer TRT-LLM Kernels: Reduce computation overhead in attention layers by skipping redundant softmax normalization: #19089

  • Speculative Decoding with FA4 Backend: Enable speculative decoding for the FA4 attention backend, combining speculative inference with next-generation flash attention for faster generation: #21080

  • MM Attention FA4 Default on SM100: Multi-modal attention now uses FA4 by default on Blackwell hardware for improved VLM performance: #21595

  • Stronger Transformers Modeling Backend: Enhanced transformers backend with full TP, PP, MoE, VLM support, and torch.compile compatibility: #19163

  • sglang-kernel 0.4.1: Major kernel package release with renamed package (sgl-kernel → sglang-kernel), consolidated kernels, and cleanup of deprecated ops: #20440, #22009

  • Native MLX Backend for Apple Silicon: Add native MLX execution backend enabling SGLang to run inference directly on Apple Silicon Macs without CUDA: #20342
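The Qwen3.5 GDN/KDA item above describes transposing the linear-attention state layout from [N, HV, K, V] to [N, HV, V, K]. A minimal NumPy sketch of that layout change, for illustration only (this is not SGLang's kernel code, and the dimensions are made up):

```python
import numpy as np

# Toy dimensions: batch N, value heads HV, key dim K, value dim V (illustrative).
N, HV, K, V = 2, 4, 8, 16
state_kv = np.arange(N * HV * K * V, dtype=np.float32).reshape(N, HV, K, V)

# Swap the last two axes and make the result contiguous, so that reading a
# full row over V becomes unit-stride in the new [N, HV, V, K] layout.
state_vk = np.ascontiguousarray(state_kv.swapaxes(-1, -2))

assert state_vk.shape == (N, HV, V, K)
# The same element is addressed with its last two indices exchanged.
assert state_vk[0, 0, 3, 5] == state_kv[0, 0, 5, 3]
```

The fused Triton kernel and CuTeDSL decode kernel referenced in the PRs operate on this transposed layout; the sketch only shows the index convention.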

New Model Support

  • Nemotron-3-Super (bf16/fp8/nvfp4): #20407, cookbook
  • Mistral Small 4 (Pixtral): #20708
  • LFM2-VL (Liquid Foundation Model 2 Vision-Language): #21230
  • Voxtral (speech-to-text): #21635
  • GLM-5: Supported on main branch with transformers 5.3.0

... (truncated)

Commits
  • 1519acf [Hotfix] Fix router gemm on sm103 (#22134)
  • c1927e1 fix: TRT-LLM MHA CUDA illegal address with EAGLE v2 + DP attention (#21649)
  • 07f57fc Enable IndexCache for DeepSeek V3.2 (#21405)
  • 164bc0a [Fix] Fix nightly tests (#22140)
  • 43654ef [diffusion] CI: improve diffusion comparison benchmark setting for realistic ...
  • 1ad6839 [Feature] Add Reasoning Tokens Usage (#15562)
  • bf984ae Revert "[Bugfix] Temporarily skip TRTLLM attention on (G)B300 (SM103) to avoi...
  • 46bf19c chore: bump flashinfer version to 0.6.7.post2 (#22097)
  • 2476325 [Speculative Decoding] Add FA4-based Spec Support (#21080)
  • 34d5765 [VLM] Chunk-aware ViT encoding with per-image cache and lazy device transfer ...
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [sglang](https://github.com/sgl-project/sglang) from 0.5.2 to 0.5.10.
- [Release notes](https://github.com/sgl-project/sglang/releases)
- [Commits](sgl-project/sglang@v0.5.2...v0.5.10)

---
updated-dependencies:
- dependency-name: sglang
  dependency-version: 0.5.10
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels Apr 8, 2026
wuxibin89 closed this Apr 9, 2026

dependabot bot commented on behalf of github Apr 9, 2026

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.
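The ignore condition mentioned above lives in `.github/dependabot.yml`. A hedged sketch of what skipping future sglang patch releases could look like (the `directory` and `schedule` values are assumptions, not taken from this repository):

```yaml
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    ignore:
      - dependency-name: "sglang"
        # Skip patch bumps like 0.5.2 -> 0.5.10; still open PRs for minor/major updates.
        update-types: ["version-update:semver-patch"]
```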

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

dependabot bot deleted the dependabot/pip/sglang-0.5.10 branch April 9, 2026 01:52